The search functionality is under construction.

Keyword Search Result

[Keyword] deep learning(149hit)

101-120hit(149hit)

  • Software Development Effort Estimation from Unstructured Software Project Description by Sequence Models

    Tachanun KANGWANTRAKOOL  Kobkrit VIRIYAYUDHAKORN  Thanaruk THEERAMUNKONG  

     
    PAPER

      Pubricized:
    2020/01/14
      Vol:
    E103-D No:4
      Page(s):
    739-747

    Most existing methods of effort estimations in software development are manual, labor-intensive and subjective, resulting in overestimation with bidding fail, and underestimation with money loss. This paper investigates effectiveness of sequence models on estimating development effort, in the form of man-months, from software project data. Four architectures; (1) Average word-vector with Multi-layer Perceptron (MLP), (2) Average word-vector with Support Vector Regression (SVR), (3) Gated Recurrent Unit (GRU) sequence model, and (4) Long short-term memory (LSTM) sequence model are compared in terms of man-months difference. The approach is evaluated using two datasets; ISEM (1,573 English software project descriptions) and ISBSG (9,100 software projects data), where the former is a raw text and the latter is a structured data table explained the characteristic of a software project. The LSTM sequence model achieves the lowest and the second lowest mean absolute errors, which are 0.705 and 14.077 man-months for ISEM and ISBSG datasets respectively. The MLP model achieves the lowest mean absolute errors which is 14.069 for ISBSG datasets.

  • Evaluating Deep Learning for Image Classification in Adversarial Environment

    Ye PENG  Wentao ZHAO  Wei CAI  Jinshu SU  Biao HAN  Qiang LIU  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2019/12/23
      Vol:
    E103-D No:4
      Page(s):
    825-837

    Due to the superior performance, deep learning has been widely applied to various applications, including image classification, bioinformatics, and cybersecurity. Nevertheless, the research investigations on deep learning in the adversarial environment are still on their preliminary stage. The emerging adversarial learning methods, e.g., generative adversarial networks, have introduced two vital questions: to what degree the security of deep learning with the presence of adversarial examples is; how to evaluate the performance of deep learning models in adversarial environment, thus, to raise security advice such that the selected application system based on deep learning is resistant to adversarial examples. To see the answers, we leverage image classification as an example application scenario to propose a framework of Evaluating Deep Learning for Image Classification (EDLIC) to conduct comprehensively quantitative analysis. Moreover, we introduce a set of evaluating metrics to measure the performance of different attacking and defensive techniques. After that, we conduct extensive experiments towards the performance of deep learning for image classification under different adversarial environments to validate the scalability of EDLIC. Finally, we give some advice about the selection of deep learning models for image classification based on these comparative results.

  • Mal2d: 2d Based Deep Learning Model for Malware Detection Using Black and White Binary Image

    Minkyoung CHO  Jik-Soo KIM  Jongho SHIN  Incheol SHIN  

     
    LETTER-Artificial Intelligence, Data Mining

      Pubricized:
    2019/12/25
      Vol:
    E103-D No:4
      Page(s):
    896-900

    We propose an effective 2d image based end-to-end deep learning model for malware detection by introducing a black & white embedding to reserve bit information and adapting the convolution architecture. Experimental results show that our proposed scheme can achieve superior performance in both of training and testing data sets compared to well-known image recognition deep learning models (VGG and ResNet).

  • Edge-SiamNet and Edge-TripleNet: New Deep Learning Models for Handwritten Numeral Recognition

    Weiwei JIANG  Le ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Pubricized:
    2019/12/09
      Vol:
    E103-D No:3
      Page(s):
    720-723

    Handwritten numeral recognition is a classical and important task in the computer vision area. We propose two novel deep learning models for this task, which combine the edge extraction method and Siamese/Triple network structures. We evaluate the models on seven handwritten numeral datasets and the results demonstrate both the simplicity and effectiveness of our models, comparing to baseline methods.

  • Combining CNN and Broad Learning for Music Classification

    Huan TANG  Ning CHEN  

     
    PAPER-Music Information Processing

      Pubricized:
    2019/12/05
      Vol:
    E103-D No:3
      Page(s):
    695-701

    Music classification has been inspired by the remarkable success of deep learning. To enhance efficiency and ensure high performance at the same time, a hybrid architecture that combines deep learning and Broad Learning (BL) is proposed for music classification tasks. At the feature extraction stage, the Random CNN (RCNN) is adopted to analyze the Mel-spectrogram of the input music sound. Compared with conventional CNN, RCNN has more flexible structure to adapt to the variance contained in different types of music. At the prediction stage, the BL technique is introduced to enhance the prediction accuracy and reduce the training time as well. Experimental results on three benchmark datasets (GTZAN, Ballroom, and Emotion) demonstrate that: i) The proposed scheme achieves higher classification accuracy than the deep learning based one, which combines CNN and LSTM, on all three benchmark datasets. ii) Both RCNN and BL contribute to the performance improvement of the proposed scheme. iii) The introduction of BL also helps to enhance the prediction efficiency of the proposed scheme.

  • Fast Inference of Binarized Convolutional Neural Networks Exploiting Max Pooling with Modified Block Structure

    Ji-Hoon SHIN  Tae-Hwan KIM  

     
    LETTER-Software System

      Pubricized:
    2019/12/03
      Vol:
    E103-D No:3
      Page(s):
    706-710

    This letter presents a novel technique to achieve a fast inference of the binarized convolutional neural networks (BCNN). The proposed technique modifies the structure of the constituent blocks of the BCNN model so that the input elements for the max-pooling operation are binary. In this structure, if any of the input elements is +1, the result of the pooling can be produced immediately; the proposed technique eliminates such computations that are involved to obtain the remaining input elements, so as to reduce the inference time effectively. The proposed technique reduces the inference time by up to 34.11%, while maintaining the classification accuracy.

  • A Non-Intrusive Speech Intelligibility Estimation Method Based on Deep Learning Using Autoencoder Features

    Yoonhee KIM  Deokgyu YUN  Hannah LEE  Seung Ho CHOI  

     
    LETTER-Speech and Hearing

      Pubricized:
    2019/12/11
      Vol:
    E103-D No:3
      Page(s):
    714-715

    This paper presents a deep learning-based non-intrusive speech intelligibility estimation method using bottleneck features of autoencoder. The conventional standard non-intrusive speech intelligibility estimation method, P.563, lacks intelligibility estimation performance in various noise environments. We propose a more accurate speech intelligibility estimation method based on long-short term memory (LSTM) neural network whose input and output are an autoencoder bottleneck features and a short-time objective intelligence (STOI) score, respectively, where STOI is a standard tool for measuring intrusive speech intelligibility with reference speech signals. We showed that the proposed method has a superior performance by comparing with the conventional standard P.563 and mel-frequency cepstral coefficient (MFCC) feature-based intelligibility estimation methods for speech signals in various noise environments.

  • Broadband Direction of Arrival Estimation Based on Convolutional Neural Network Open Access

    Wenli ZHU  Min ZHANG  Chenxi WU  Lingqing ZENG  

     
    PAPER-Fundamental Theories for Communications

      Pubricized:
    2019/08/27
      Vol:
    E103-B No:3
      Page(s):
    148-154

    A convolutional neural network (CNN) for broadband direction of arrival (DOA) estimation of far-field electromagnetic signals is presented. The proposed algorithm performs a nonlinear inverse mapping from received signal to angle of arrival. The signal model used for algorithm is based on the circular antenna array geometry, and the phase component extracted from the spatial covariance matrix is used as the input of the CNN network. A CNN model including three convolutional layers is then established to approximate the nonlinear mapping. The performance of the CNN model is evaluated in a noisy environment for various values of signal-to-noise ratio (SNR). The results demonstrate that the proposed CNN model with the phase component of the spatial covariance matrix as the input is able to achieve fast and accurate broadband DOA estimation and attains perfect performance at lower SNR values.

  • Real-Time Generic Object Tracking via Recurrent Regression Network

    Rui CHEN  Ying TONG  Ruiyu LIANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2019/12/20
      Vol:
    E103-D No:3
      Page(s):
    602-611

    Deep neural networks have achieved great success in visual tracking by learning a generic representation and leveraging large amounts of training data to improve performance. Most generic object trackers are trained from scratch online and do not benefit from a large number of videos available for offline training. We present a real-time generic object tracker capable of incorporating temporal information into its model, learning from many examples offline and quickly updating online. During the training process, the pre-trained weight of convolution layer is updated lagging behind, and the input video sequence length is gradually increased for fast convergence. Furthermore, only the hidden states in recurrent network are updated to guarantee the real-time tracking speed. The experimental results show that the proposed tracking method is capable of tracking objects at 150 fps with higher predicting overlap rate, and achieves more robustness in multiple benchmarks than state-of-the-art performance.

  • Simple Black-Box Adversarial Examples Generation with Very Few Queries

    Yuya SENZAKI  Satsuya OHATA  Kanta MATSUURA  

     
    PAPER-Reliability and Security of Computer Systems

      Pubricized:
    2019/10/02
      Vol:
    E103-D No:2
      Page(s):
    212-221

    Research on adversarial examples for machine learning has received much attention in recent years. Most of previous approaches are white-box attacks; this means the attacker needs to obtain before-hand internal parameters of a target classifier to generate adversarial examples for it. This condition is hard to satisfy in practice. There is also research on black-box attacks, in which the attacker can only obtain partial information about target classifiers; however, it seems we can prevent these attacks, since they need to issue many suspicious queries to the target classifier. In this paper, we show that a naive defense strategy based on surveillance of number query will not suffice. More concretely, we propose to generate not pixel-wise but block-wise adversarial perturbations to reduce the number of queries. Our experiments show that such rough perturbations can confuse the target classifier. We succeed in reducing the number of queries to generate adversarial examples in most cases. Our simple method is an untargeted attack and may have low success rates compared to previous results of other black-box attacks, but needs in average fewer queries. Surprisingly, the minimum number of queries (one and three in MNIST and CIFAR-10 dataset, respectively) is enough to generate adversarial examples in some cases. Moreover, based on these results, we propose a detailed classification for black-box attackers and discuss countermeasures against the above attacks.

  • Recurrent Neural Network Compression Based on Low-Rank Tensor Representation

    Andros TJANDRA  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Music Information Processing

      Pubricized:
    2019/10/17
      Vol:
    E103-D No:2
      Page(s):
    435-449

    Recurrent Neural Network (RNN) has achieved many state-of-the-art performances on various complex tasks related to the temporal and sequential data. But most of these RNNs require much computational power and a huge number of parameters for both training and inference stage. Several tensor decomposition methods are included such as CANDECOMP/PARAFAC (CP), Tucker decomposition and Tensor Train (TT) to re-parameterize the Gated Recurrent Unit (GRU) RNN. First, we evaluate all tensor-based RNNs performance on sequence modeling tasks with a various number of parameters. Based on our experiment results, TT-GRU achieved the best results in a various number of parameters compared to other decomposition methods. Later, we evaluate our proposed TT-GRU with speech recognition task. We compressed the bidirectional GRU layers inside DeepSpeech2 architecture. Based on our experiment result, our proposed TT-format GRU are able to preserve the performance while reducing the number of GRU parameters significantly compared to the uncompressed GRU.

  • On Performance of Deep Learning for Harmonic Spur Cancellation in OFDM Systems

    Ziming HE  

     
    LETTER-Mobile Information Network and Personal Communications

      Vol:
    E103-A No:2
      Page(s):
    576-579

    In this letter, the performance of a state-of-the-art deep learning (DL) algorithm in [5] is analyzed and evaluated for orthogonal frequency-division multiplexing (OFDM) receivers, in the presence of harmonic spur interference. Moreover, a novel spur cancellation receiver structure and algorithm are proposed to enhance the traditional OFDM receivers, and serve as a performance benchmark for the DL algorithm. It is found that the DL algorithm outperforms the traditional algorithm and is much more robust to spur carrier frequency offset.

  • CsiNet-Plus Model with Truncation and Noise on CSI Feedback Open Access

    Feng LIU  Xuecheng HE  Conggai LI  Yanli XU  

     
    LETTER-Communication Theory and Signals

      Vol:
    E103-A No:1
      Page(s):
    376-381

    For the frequency-division-duplex (FDD)-based massive multiple-input multiple-output (MIMO) systems, channel state information (CSI) feedback plays a critical role. Although deep learning has been used to compress the CSI feedback, some issues like truncation and noise still need further investigation. Facing these practical concerns, we propose an improved model (called CsiNet-Plus), which includes a truncation process and a channel noise process. Simulation results demonstrate that the CsiNet-Plus outperforms the existing CsiNet. The performance interchangeability between truncated decimal digits and the signal-to-noise-ratio helps support flexible configuration.

  • On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework

    Xingyu ZHANG  Xia ZOU  Meng SUN  Penglong WU  Yimin WANG  Jun HE  

     
    LETTER-Speech and Hearing

      Vol:
    E103-A No:1
      Page(s):
    356-360

    In order to improve the noise robustness of automatic speaker recognition, many techniques on speech/feature enhancement have been explored by using deep neural networks (DNN). In this work, a DNN multi-level enhancement (DNN-ME), which consists of the stages of signal enhancement, cepstrum enhancement and i-vector enhancement, is proposed for text-independent speaker recognition. Given the fact that these enhancement methods are applied in different stages of the speaker recognition pipeline, it is worth exploring the complementary role of these methods, which benefits the understanding of the pros and cons of the enhancements of different stages. In order to use the capabilities of DNN-ME as much as possible, two kinds of methods called Cascaded DNN-ME and joint input of DNNs are studied. Weighted Gaussian mixture models (WGMMs) proposed in our previous work is also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database have shown that DNN-ME demonstrated significant superiority over the systems with only a single enhancement for noise robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.

  • Detecting Surface Defects of Wind Tubine Blades Using an Alexnet Deep Learning Algorithm Open Access

    Xiao-Yi ZHAO  Chao-Yi DONG  Peng ZHOU  Mei-Jia ZHU  Jing-Wen REN  Xiao-Yan CHEN  

     
    PAPER-Machine Learning

      Vol:
    E102-A No:12
      Page(s):
    1817-1824

    The paper employed an Alexnet, which is a deep learning framework, to automatically diagnose the damages of wind power generator blade surfaces. The original images of wind power generator blade surfaces were captured by machine visions of a 4-rotor UAV (unmanned aerial vehicle). Firstly, an 8-layer Alexnet, totally including 21 functional sub-layers, is constructed and parameterized. Secondly, the Alexnet was trained with 10000 images and then was tested by 6-turn 350 images. Finally, the statistic of network tests shows that the average accuracy of damage diagnosis by Alexnet is about 99.001%. We also trained and tested a traditional BP (Back Propagation) neural network, which have 20-neuron input layer, 5-neuron hidden layer, and 1-neuron output layer, with the same image data. The average accuracy of damage diagnosis of BP neural network is 19.424% lower than that of Alexnet. The point shows that it is feasible to apply the UAV image acquisition and the deep learning classifier to diagnose the damages of wind turbine blades in service automatically.

  • Attentive Sequences Recurrent Network for Social Relation Recognition from Video Open Access

    Jinna LV  Bin WU  Yunlei ZHANG  Yunpeng XIAO  

     
    PAPER-Image Recognition, Computer Vision

      Pubricized:
    2019/09/02
      Vol:
    E102-D No:12
      Page(s):
    2568-2576

    Recently, social relation analysis receives an increasing amount of attention from text to image data. However, social relation analysis from video is an important problem, which is lacking in the current literature. There are still some challenges: 1) it is hard to learn a satisfactory mapping function from low-level pixels to high-level social relation space; 2) how to efficiently select the most relevant information from noisy and unsegmented video. In this paper, we present an Attentive Sequences Recurrent Network model, called ASRN, to deal with the above challenges. First, in order to explore multiple clues, we design a Multiple Feature Attention (MFA) mechanism to fuse multiple visual features (i.e. image, motion, body, and face). Through this manner, we can generate an appropriate mapping function from low-level video pixels to high-level social relation space. Second, we design a sequence recurrent network based on Global and Local Attention (GLA) mechanism. Specially, an attention mechanism is used in GLA to integrate global feature with local sequence feature to select more relevant sequences for the recognition task. Therefore, the GLA module can better deal with noisy and unsegmented video. At last, extensive experiments on the SRIV dataset demonstrate the performance of our ASRN model.

  • Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition

    Dichao LIU  Yu WANG  Jien KATO  

     
    PAPER-Image Recognition, Computer Vision

      Pubricized:
    2019/09/04
      Vol:
    E102-D No:12
      Page(s):
    2577-2586

    The aim of this paper is to propose effective attentional regions for fine-grained visual recognition. Based on the Spatial Transformers' capability of spatial manipulation within networks, we propose an extension model, the Attention-Guided Spatial Transformer Networks (AG-STNs). This model can guide the Spatial Transformers with hard-coded attentional regions at first. Then such guidance can be turned off, and the network model will adjust the region learning in terms of the location and scale. Such adjustment is conditioned to the classification loss so that it is actually optimized for better recognition results. With this model, we are able to successfully capture detailed attentional information. Also, the AG-STNs are able to capture attentional information in multiple levels, and different levels of attentional information are complementary to each other in our experiments. A fusion of them brings better results.

  • Tweet Stance Detection Using Multi-Kernel Convolution and Attentive LSTM Variants

    Umme Aymun SIDDIQUA  Abu Nowshed CHY  Masaki AONO  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2019/09/25
      Vol:
    E102-D No:12
      Page(s):
    2493-2503

    Stance detection in twitter aims at mining user stances expressed in a tweet towards a single or multiple target entities. Detecting and analyzing user stances from massive opinion-oriented twitter posts provide enormous opportunities to journalists, governments, companies, and other organizations. Most of the prior studies have explored the traditional deep learning models, e.g., long short-term memory (LSTM) and gated recurrent unit (GRU) for detecting stance in tweets. However, compared to these traditional approaches, recently proposed densely connected bidirectional LSTM and nested LSTMs architectures effectively address the vanishing-gradient and overfitting problems as well as dealing with long-term dependencies. In this paper, we propose a neural network model that adopts the strengths of these two LSTM variants to learn better long-term dependencies, where each module coupled with an attention mechanism that amplifies the contribution of important elements in the final representation. We also employ a multi-kernel convolution on top of them to extract the higher-level tweet representations. Results of extensive experiments on single and multi-target benchmark stance detection datasets show that our proposed method achieves substantial improvement over the current state-of-the-art deep learning based methods.

  • Channel and Frequency Attention Module for Diverse Animal Sound Classification

    Kyungdeuk KO  Jaihyun PARK  David K. HAN  Hanseok KO  

     
    LETTER-Artificial Intelligence, Data Mining

      Pubricized:
    2019/09/17
      Vol:
    E102-D No:12
      Page(s):
    2615-2618

    In-class species classification based on animal sounds is a highly challenging task even with the latest deep learning technique applied. The difficulty of distinguishing the species is further compounded when the number of species is large within the same class. This paper presents a novel approach for fine categorization of animal species based on their sounds by using pre-trained CNNs and a new self-attention module well-suited for acoustic signals The proposed method is shown effective as it achieves average species accuracy of 98.37% and the minimum species accuracy of 94.38%, the highest among the competing baselines, which include CNN's without self-attention and CNN's with CBAM, FAM, and CFAM but without pre-training.

  • TFIDF-FL: Localizing Faults Using Term Frequency-Inverse Document Frequency and Deep Learning

    Zhuo ZHANG  Yan LEI  Jianjun XU  Xiaoguang MAO  Xi CHANG  

     
    LETTER-Software Engineering

      Pubricized:
    2019/05/27
      Vol:
    E102-D No:9
      Page(s):
    1860-1864

    Existing fault localization based on neural networks utilize the information of whether a statement is executed or not executed to identify suspicious statements potentially responsible for a failure. However, the information just shows the binary execution states of a statement, and cannot show how important a statement is in executions. Consequently, it may degrade fault localization effectiveness. To address this issue, this paper proposes TFIDF-FL by using term frequency-inverse document frequency to identify a high or low degree of the influence of a statement in an execution. Our empirical results on 8 real-world programs show that TFIDF-FL significantly improves fault localization effectiveness.

101-120hit(149hit)